A voice measure of the speaker’s physiological state has
unique applications in the aerospace environment. Unlike other 
physiological measures, a voice measure is unobtrusive and does 
not require attaching any equipment to the person being tested. 

It can be employed in cockpit and spacecraft settings without 
interfering with ongoing activity and, if used on radio- 
transmitted speech, might be employed without any additional 
equipment in the flight environment. A voice measure can also be 
used on recorded speech as, for example, in accident 
investigation to determine the relative stress levels of 
different statements by the flightcrew for information relevant 
to human performance issues in the investigation. For the 
purposes of this paper, the term "stress" is used to mean changes
in physiological state that result from changes in workload
demands.

The aerospace community has been active in research on voice 
stress analysis (refs. 1 and 2). Although several aspects of the
voice have been defined that appear to respond to psychological 
stress, it remains unclear from the research literature whether 
such voice changes are sufficiently robust to allow for practical 
assessment. Practical applications would probably require a 
single voice measure that is reliable across subjects and 
situations or, alternately, a battery of voice measures that 
could be applied to each individual subject and produce a 
reliable profile of that individual’s response to stress. 


Research reported in this paper was supported by the School of 
Aerospace Medicine, Brooks Air Force Base, Texas. It was 
executed at the Speech Research Laboratory, Veterans 
Administration Medical Center, San Francisco, California. 

The present paper reports on a research program that is
examining issues related to practical voice assessment. The 
first part of the program was to identify those candidate voice 
measures from the available research literature that displayed 
the greatest promise of responding to psychological stress 
changes. Eight such measures were identified. The second part 
of the program was to execute an original laboratory experiment
that involved clear physiological changes on the part of the
subjects within the type of stress range that might be
encountered in routine aerospace activity (as opposed to the
higher stress range typically encountered in emergency situations,
from which much of the scientific voice information has been
derived). The experiment employed an aviation-like tracking
task, varying both task difficulty and monetary incentives.

The third part of the research program was to automate the 
eight candidate voice measures and compare their responses within 
the laboratory data to those of traditional physiological 
measures such as heart rate. This part of the research program 
is partially complete, with five of the candidate measures 
automated, and this paper reports the initial results of this 
effort.


CANDIDATE VOICE MEASURES 


Eight candidate voice measures were determined that, it was 
believed, showed the greatest promise of responding to 
psychological stress. The choice of these measures was assisted 
by a comprehensive literature review completed recently for the 
Naval Air Test Center (ref. 1) and by the authors' familiarity 
with recent developments in the voice stress area. 

The eight candidate measures are:

1) Fundamental frequency (pitch). Under stress, there may be
an increase in the fundamental frequency of the voice.
Fundamental frequency, which may reflect the physical tension of
the vocal muscles, is among the most frequently cited voice
indices of stress. In emergency situations an increase in
fundamental frequency may be universal (refs. 3, 4, and 5).

2) Amplitude (loudness). Under stress, there may be an
increase in the amplitude of the voice. This change would 
probably reflect an increased air flow through the lungs that 
often occurs under stress. 

3) Speech rate. Under stress, there may be an increase in
speech rate. This change would be related to a general speeding
up of cognitive and motor processes that often appears under
stress.

4) Frequency jitter. Under stress, there may be a decrease
in jitter of the voice fundamental frequency. Jitter is the 
minute variability which occurs in the spacing of the fundamental 
frequency periods (when measured on a cycle-by-cycle basis). It 
represents a subtle aspect of audible speech that can be 
difficult to measure precisely (ref. 6). Lieberman (ref. 7) 
proposed that jitter decreases in response to psychological 
stress, and there is recent supporting evidence (ref. 3). 

5) Amplitude shimmer. Under stress, there may be a
decrease in shimmer of voice amplitude. Shimmer is the
cycle-by-cycle variability in the amplitude pattern (it is to
amplitude what jitter is to frequency). Although no
literature relates shimmer to psychological stress, it seems 
reasonable from theoretical considerations that it might follow a 
pattern similar to that of jitter. 

6) PSE scores. Under stress, there may be an increase in
scores determined from the Psychological Stress Evaluator (PSE).
The PSE is the best-researched of a series of commercial voice
devices sold for lie detection. There is substantial evidence
that the PSE is not valid for lie detection (refs. 8 and 9).
Lie detection is a questionable application for any stress
measure, because it requires subjective determinations by the
person administering the test to infer the presence of lying
(ref. 10). However, there is also evidence that the PSE-derived
scores may respond to simple manipulations of stress (refs. 11
and 12).

7) Energy distribution. Under stress, there may be an 
increase in the proportion of speech energy between 500 and 1000 
Hz. Scherer (refs. 13 and 14) provides evidence for this effect. 

8) Derived measure. Under stress, there may be a reliable
increase in a derived measure that statistically combines other
measures described above. This approach has been advanced by 
Brenner (ref. 15), who uses the "improper linear model" of Dawes 
(ref. 16) to provide a simple statistical combination of 
component speech measures. In theory, the derived measure should 
then reflect any unusual changes within the same speaker’s voice 
on one or many component measures. In a recent judicial
decision, in the legal case of [case name illegible], such an
approach to voice stress analysis was judged to provide
admissible evidence (refs. 17 and 18).
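
As an illustration of how three of the measures listed above might be
computed, the following Python sketch calculates jitter, shimmer, and
the proportion of energy between 500 and 1000 Hz. It is not the
analysis program used in this research. The function names, the
normalization of jitter and shimmer by the mean value, and the use of
a plain FFT power spectrum are all assumptions made for the example,
and the cycle periods and cycle peak amplitudes are assumed to come
from a separate pitch-extraction step.

```python
import numpy as np


def mean_relative_perturbation(values):
    """Mean absolute cycle-to-cycle difference, divided by the mean value."""
    v = np.asarray(values, dtype=float)
    if v.size < 2:
        raise ValueError("need at least two cycles")
    return float(np.mean(np.abs(np.diff(v))) / np.mean(v))


def jitter(periods_s):
    """Cycle-by-cycle variability of the fundamental period (assumed definition)."""
    return mean_relative_perturbation(periods_s)


def shimmer(peak_amplitudes):
    """Cycle-by-cycle variability of the cycle peak amplitude (assumed definition)."""
    return mean_relative_perturbation(peak_amplitudes)


def band_energy_proportion(signal, sample_rate_hz, low_hz=500.0, high_hz=1000.0):
    """Fraction of total spectral energy that lies between low_hz and high_hz."""
    x = np.asarray(signal, dtype=float)
    power = np.abs(np.fft.rfft(x)) ** 2
    total = power.sum()
    if total == 0:
        raise ValueError("signal has no energy")
    freqs = np.fft.rfftfreq(x.size, d=1.0 / sample_rate_hz)
    in_band = (freqs >= low_hz) & (freqs <= high_hz)
    return float(power[in_band].sum() / total)
```

Under these definitions a perfectly regular voice gives a jitter of
zero, and the predicted stress effect corresponds to jitter and
shimmer values moving toward zero while the band energy proportion
rises.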


LABORATORY EXPERIMENT


An experiment was designed that, it was hoped, would provide 
clear physiological differences within the subjects tested. The 
experiment employed the tracking task of Jex, McDonnell & Phatak
(ref. 19), a highly motivating task requiring good reaction time 
that has been employed extensively in aerospace research. This 
task can be varied over a wide range of difficulties, and 
previous literature has suggested physiological changes in 
response to task loading on measures drawn from heart, 
respiration, and EMG data (refs. 20 and 21). For the present 
experiment, monetary incentives were used along with task loading 
to help guarantee a clear physiological response. 

Heart data were obtained from the subjects during the 
experiment, and excellent voice recordings were obtained of the 
spoken responses in digital format. Preliminary results 
available at this time indicate a clear direction for the voice 
measures that have been tested. 


Subjects

Seventeen males, ranging in age from 21 to 35 years old, 
served as subjects. They were paid $50 plus any monetary 
incentives won during the experiment. 

Procedure 

The experiment employed the tracking task of Jex, McDonnell 
& Phatak (ref. 19) implemented on the Commodore 64 computer. In 
this task the subject is seated at a CRT display with a manual 
joystick and attempts to keep a computer-generated triangle at 
the center of the screen. The triangle moves left and right 
horizontally in an unpredictable pattern until it touches a left 
or right boundary on the screen and the trial ends (giving the 
subject a task similar to balancing a broomstick on a fingertip). 
A numerical value, the Lambda score, quantifies the mathematical 
unpredictability of the triangle’s gyrations.

Each subject participated in two sessions. At Session 1,
the subject was seated in a practice room and trained on the
tracking task (25 trials, 10-minute break, 25 trials, 10-minute
break). At this time subjects performed the "critical" form of
the task, in which Lambda was shown on the screen and increased 
progressively during the trial. The subject attempted to achieve 
as high a Lambda score as possible before the triangle went out 
of bounds. To provide speech data, subjects counted aloud on 
half of the trials. Every ten seconds during the trial, 
following a computer-generated cueing tone, subjects counted 
aloud from 90 to 100 as quickly as possible. The counting task 
was chosen because it causes minimal interference with the 
tracking task, and the numbers 90 to 100 were chosen because they 
provide an excellent acoustic pattern with almost continuous 
voicing.

Following this training, the subject was seated in the 
laboratory and attached to data recording equipment. Heart rate 
data were recorded on a multi-channel FM recorder via
silver/silver chloride electrode monitors attached to the right
and left upper rib areas and base of the neck (the ground
electrode). Speech data were recorded via a 1" condenser
microphone contained in a custom-modified rubber anaesthesia mask
worn by the subject. Speech data were captured digitally in
real time on a laboratory computer at a sampling rate of 10 kHz
(the rubber mask also contained a pneumotachograph to measure
respiration, and data from this measure are to be described in
future papers).

Following a warmup period (ten trials of the "critical"
task), subjects performed the "subcritical" form of the tracking
task. In this form the Lambda score, not shown on the screen,
was fixed at a specific level of difficulty. The subject’s task
was to keep the triangle centered for as long as possible up to
ninety seconds. On some trials the Lambda score was "easy"
(Lambda = 0.9), on some trials "difficult" (Lambda = 90% of the
subject’s best practice score, median of five trials), and on
some trials "moderate" (Lambda = 75% of the subject’s best
practice score). Each subject performed two trials at each
difficulty level. Finally, the subject rested for fifteen 
minutes, provided baseline measures, and was dismissed. The 
purpose of Session 1 was training and familiarization, and none 
of the data collected at Session 1 were analyzed. 
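
The three difficulty levels described above reduce to simple
arithmetic on each subject's practice performance. The following
Python sketch shows one reading of that rule. It assumes that the
"best practice score" is the median of five practice-trial Lambda
scores supplied by the caller, and the function name and dictionary
keys are hypothetical.

```python
from statistics import median


def subcritical_levels(practice_lambda_scores, easy_lambda=0.9):
    """Return the fixed Lambda value for each difficulty level.

    practice_lambda_scores: the five practice scores whose median is taken
    as the subject's best practice score (an assumption of this sketch).
    """
    best = median(practice_lambda_scores)
    return {
        "easy": easy_lambda,        # fixed value, the same for every subject
        "moderate": 0.75 * best,    # 75% of best practice score
        "difficult": 0.90 * best,   # 90% of best practice score
    }
```

For a subject whose five practice scores have a median of 5.0, this
gives a difficult level of 4.5 and a moderate level of 3.75.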

At Session 2, several days later, the subject was again
seated in the laboratory and attached to data recording
equipment. The subject performed a warmup procedure (ten trials
of the "critical" task). The subject then performed several
trials of the "subcritical" task, both "easy" and "difficult",
and these trials represent the principal source of data for the
experiment. For these trials, the subject was offered monetary
bonuses. On easy trials (Lambda = 0.9) the subject was offered
two dollars if he could complete a successful ninety-second trial
within two attempts. All subjects performed perfectly on the
first attempt. On difficult trials (Lambda = 90% of best
practice score) the subject was offered fifty dollars if he could
complete a successful ninety-second trial within two attempts.
Those subjects who failed at this bonus were offered forty-five
dollars and two attempts to complete a slightly less difficult
task (Lambda = 85% of best score). All subjects succeeded by the
end of this second bonus (median Lambda value = 4.2). The order
of easy and difficult presentations was counterbalanced across
subjects.


To complete Session 2, the subject rested for fifteen
minutes and provided baseline measures. The subject was 
debriefed, paid, and dismissed. 


Data Reduction 

An automated program was prepared for data reduction related 
to five of the automated speech measures. The extraction of 
these parameters was based on algorithms and software developed 
by E. Thomas Doherty, Ph.D., of the Speech Research Laboratory, 
Veterans Administration Research Laboratory, San Francisco, 
California. Dr. Doherty also served as a consultant on this 
project, and technical details of the analysis program will be 
provided in other reports. 

The automated program inputs recorded speech at slow speed,
segmenting it into speech periods and removing the silent periods
between syllables and words. The program outputs automated
measures for five of the candidate speech measures: fundamental
frequency, amplitude, speech rate (i.e., total time to speak the
ten numbers), jitter, and shimmer.
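
The segmentation step can be pictured with a short Python sketch.
This is not the program developed for this project, whose details are
to be reported elsewhere. It assumes a simple frame-energy threshold,
and the frame length, the threshold ratio, and the function names are
all hypothetical choices made for illustration.

```python
import numpy as np


def speech_segments(signal, sample_rate_hz, frame_ms=10.0, threshold_ratio=0.1):
    """Return (start, end) sample indices of the speech periods.

    A frame counts as speech when its RMS level exceeds threshold_ratio
    times the loudest frame's RMS level (an assumed, simplistic criterion).
    """
    x = np.asarray(signal, dtype=float)
    frame_len = int(sample_rate_hz * frame_ms / 1000.0)
    n_frames = x.size // frame_len
    if frame_len == 0 or n_frames == 0:
        return []
    frames = x[: n_frames * frame_len].reshape(n_frames, frame_len)
    rms = np.sqrt(np.mean(frames ** 2, axis=1))
    if rms.max() == 0:
        return []
    active = rms > threshold_ratio * rms.max()
    # Pad with silence so every run of active frames has a start and an end.
    padded = np.concatenate(([False], active, [False]))
    changes = np.flatnonzero(padded[1:] != padded[:-1])
    starts, ends = changes[0::2], changes[1::2]
    return [(int(s) * frame_len, int(e) * frame_len) for s, e in zip(starts, ends)]


def speaking_time_s(segments, sample_rate_hz):
    """Total duration of the speech periods, with silent periods excluded."""
    return sum(end - start for start, end in segments) / sample_rate_hz
```

With the silent periods removed, the summed segment duration gives a
speaking time, and each segment can be passed on to the pitch and
amplitude measures.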

Results

Data analysis applied to three trials from Session 2 for
each subject: the successful "difficult" trial on which the
subject won $50 or $45; the successful "easy" trial on which the
subject won $2; and a baseline trial on which the subject simply
counted. Speech data on each trial consisted of nine
repetitions of the numbers 90 to 100.

Figure 1 displays heart rate data. Average heart rate was
83 bpm on the baseline trial, 88 bpm on the easy trial, and 100
bpm on the difficult trial (F (2/32) = 22.1, p < .001). An
analysis-of-variance test proved highly significant for the
overall difference between difficult and easy (F (1/32) = 21.2,
p < .001), and 16 of the 17 subjects showed a higher average heart
rate on the difficult treatment than on the easy treatment (sign
test: p < .001). Based on the heart rate data, then, the
experiment produced a clear physiological response against which
the voice measures can be compared.
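
The sign test quoted above can be checked with a few lines of Python.
The sketch computes an exact one-sided binomial probability. Whether
the test reported here was one-sided or two-sided is not stated, so
the one-sided form and the function name are assumptions.

```python
from math import comb


def sign_test_p(n_predicted, n_subjects):
    """Exact one-sided sign test.

    Probability of at least n_predicted subjects moving in the predicted
    direction if each direction were equally likely (p = 0.5).
    """
    tail = sum(comb(n_subjects, k) for k in range(n_predicted, n_subjects + 1))
    return tail / 2 ** n_subjects


p = sign_test_p(16, 17)  # 18 / 131072, about 0.00014
```

For 16 of 17 subjects the probability is about 0.00014, and doubling
it for a two-sided test still leaves it below .001.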

Speech data are summarized in Tables 1 and 2 and in Figures
2, 3, and 4. The analysis-of-variance values reported in Tables 1
and 2 are for differences between the treatment means (a more
complete analysis, treatment x time, has not been completed).
The second column of Table 2 ("Number of subjects with predicted
effect") represents a sign test.
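
The degrees of freedom reported with the F values follow from the
design: 17 subjects each measured under 3 treatments give 2 and 32
degrees of freedom for a within-subjects comparison of the treatment
means. The Python sketch below shows that computation. It assumes a
one-way repeated-measures analysis of variance, which is consistent
with the reported degrees of freedom but is not confirmed as the exact
procedure used, and the function name is hypothetical.

```python
import numpy as np


def repeated_measures_anova(data):
    """One-way within-subjects ANOVA on a subjects x treatments array.

    Returns (F, df_treatment, df_error).
    """
    y = np.asarray(data, dtype=float)
    n_subjects, n_treatments = y.shape
    grand_mean = y.mean()
    # Partition the total sum of squares into treatment, subject, and error.
    ss_treatment = n_subjects * np.sum((y.mean(axis=0) - grand_mean) ** 2)
    ss_subject = n_treatments * np.sum((y.mean(axis=1) - grand_mean) ** 2)
    ss_total = np.sum((y - grand_mean) ** 2)
    ss_error = ss_total - ss_treatment - ss_subject
    df_treatment = n_treatments - 1
    df_error = (n_treatments - 1) * (n_subjects - 1)
    f_value = (ss_treatment / df_treatment) / (ss_error / df_error)
    return float(f_value), df_treatment, df_error
```

Removing the between-subject sum of squares from the error term is
what lets each speaker serve as his own control, which matters for
voice measures that differ widely from one speaker to the next.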

Amplitude displayed a highly significant relation to the 
task and, as shown in Figure 2, provided a pattern resembling
that of heart rate. Average amplitude increased between the easy 
and difficult treatments by a magnitude of about 0.07 volts, a 
change that was clearly measurable but that would be virtually 
impossible to recognize in normal conversation. Fundamental 
frequency also increased in response to the task, providing a 
pattern of results less robust than that of amplitude. Average 
fundamental frequency varied between the easy and difficult 
treatments by a magnitude of about 2 Hz, a change that is also
negligible in normal conversation. 

The speech rate measure provided a marginally significant 
discrimination of the three treatments. Speech rate also showed 
the highest consistency across subjects of any of the speech 
measures.


The jitter measure responded in the predicted direction, but 
to a marginal degree that produced little statistical effect. 

This measure is of theoretical interest but, pending the results 
of a complete analysis, does not appear to respond to the type of 
stress present on this task. Shimmer also responded with 
marginal effect, but showed a consistency across subjects that 
suggests a need for further study. 


CONCLUSIONS 


Previous literature has reported increases in fundamental 
frequency, amplitude, and speech rate in the voices of speakers 
involved in extreme levels of stress (refs. 3, 4, and 5) (and 
these changes are among the major components of screaming). What
seems remarkable about the present results is that the same 
changes appear to occur in a regular fashion within a more subtle 
level of stress that may be characteristic, for example, of 
routine flying situations. This evidence adds confidence that 
these changes reflect some valid underlying physiological 
response of the human speech system. 

The results of our experiment replicate exactly those 
reported recently by Griffin & Williams (ref. 22). Working in an 
aircraft simulator setting, they found that increases in speech 
amplitude, fundamental frequency, and speech rate appeared in the 
subjects’ speech in response to increased workload demands. The
combined evidence of the experiments helps establish these three 
voice measures as parameters for aerospace applications. 

In our research, none of the individual speech measures 
performed as robustly as did heart rate. An area of active 
future interest is to develop a single derived speech measure, 
drawing information from several component speech aspects, and to 
compare the performance of this measure with that of a measure 
such as heart rate. Another area of future interest is the 
possibility of developing a convenient and even real-time 
assessment technique, especially given the current explosion in 
automated speech processing technology. Voice stress analysis is 
maturing as a research area, and we urge our colleagues to 
consider voice response in their thinking about mental-state 
estimation.
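
One way such a derived measure might be formed is sketched below in
Python. Each component measure is expressed as a standard score
relative to the same speaker's baseline, signed by its predicted
direction under stress, and averaged with equal weights, in the spirit
of an improper linear model. The measure names, the direction table,
and the function name are assumptions made for this example and do not
describe a procedure already applied to the present data.

```python
import numpy as np

# Predicted direction of change under stress for each component measure.
PREDICTED_DIRECTION = {
    "f0": +1,
    "amplitude": +1,
    "speech_rate": +1,
    "jitter": -1,
    "shimmer": -1,
}


def derived_stress_score(baseline, sample, directions=PREDICTED_DIRECTION):
    """Equal-weight average of signed standard scores for one speaker.

    baseline: dict mapping measure name to that speaker's baseline values.
    sample:   dict mapping measure name to the value being scored.
    """
    signed_z = []
    for name, sign in directions.items():
        reference = np.asarray(baseline[name], dtype=float)
        spread = reference.std(ddof=1)
        if spread == 0:
            raise ValueError(f"baseline for {name} has no variability")
        signed_z.append(sign * (sample[name] - reference.mean()) / spread)
    return float(np.mean(signed_z))
```

A score near zero indicates speech typical of the speaker's baseline,
while a positive score indicates movement in the predicted direction
on one or more of the component measures.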


Table 1. Differences between the treatment means for the five
voice measures (analysis of variance).


                           F (2/32)

Fundamental Frequency      7.1**
Amplitude                  10.2***
Speech Rate                3.1*
Jitter                     0.1
Shimmer                    [illegible]

*   p < .10
**  p < .01
*** p < .001


Table 2. Differences between the easy and difficult treatment
means for the five voice measures (analysis of variance/sign
test).


                           F (1/32)     Number of subjects
                                        with predicted effect

Fundamental Frequency      2.9*         10/17
Amplitude                  5.0**        13/17**
Speech Rate                2.5          14/17***
Jitter                     0.1          9/17
Shimmer                    0.7          13/17**

*   p < .10
**  p < .05
*** p < .005